Is random model better? On its accuracy and efficiency
Abstract
Inductive learning searches for an optimal hypothesis that minimizes a given loss function. It is usually assumed that the simplest hypothesis that fits the data is the best approximation of the optimal hypothesis. Since finding the simplest hypothesis is NP-hard for most representations, we generally employ various heuristics to search for its closest match. Computing these heuristics incurs significant cost, making learning inefficient and unscalable for large datasets. At the same time, it is still questionable whether the simplest hypothesis is indeed the closest approximation of the optimal model. The recent success of methods that combine multiple models, such as bagging, boosting and meta-learning, has greatly improved on the accuracy of the simplest hypothesis, providing a strong argument against its optimality. However, computing these combined hypotheses incurs significantly higher cost. In this paper, we first argue that as long as the error of a hypothesis on each example is within a range dictated by a given loss function, the hypothesis can still be optimal. Contrary to common belief, we propose a completely random decision tree algorithm that achieves much higher accuracy than the single best hypothesis and is comparable to boosted or bagged multiple best hypotheses. The advantages of multiple random trees are their training efficiency and minimal memory requirements.
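The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of combining completely random trees, not the authors' implementation. It assumes binary features, a fixed tree depth, and a prediction rule that averages the class frequencies found in the matching leaf of each tree; all function names and parameters are hypothetical.

```python
import random
from collections import Counter


def build_tree(features, depth, rng):
    """Grow a tree structure by picking split features at random.

    No labels and no purity heuristic are consulted (assumed reading
    of "completely random").
    """
    if depth == 0 or not features:
        return {"counts": Counter()}  # leaf: class counts are filled in later
    f = rng.choice(features)
    rest = [g for g in features if g != f]  # test each feature once per path
    return {
        "feature": f,
        "children": {v: build_tree(rest, depth - 1, rng) for v in (0, 1)},
    }


def find_leaf(tree, x):
    """Follow the binary feature tests of x down to a leaf."""
    while "feature" in tree:
        tree = tree["children"][x[tree["feature"]]]
    return tree


def fit_ensemble(X, y, n_trees=10, depth=2, seed=0):
    """Build random structures, then record training labels in the leaves."""
    rng = random.Random(seed)
    features = list(range(len(X[0])))
    trees = [build_tree(features, depth, rng) for _ in range(n_trees)]
    for tree in trees:
        for x, label in zip(X, y):
            find_leaf(tree, x)["counts"][label] += 1
    return trees


def predict_proba(trees, x):
    """Average the leaf class frequencies over all trees that have evidence."""
    totals = Counter()
    used = 0
    for tree in trees:
        counts = find_leaf(tree, x)["counts"]
        n = sum(counts.values())
        if n == 0:
            continue  # empty leaf: this tree has seen no example like x
        used += 1
        for label, c in counts.items():
            totals[label] += c / n
    return {label: p / used for label, p in totals.items()} if used else {}
```

In this sketch, training cost is one pass over the data per tree, since no split-selection heuristic is evaluated; calling `fit_ensemble(X, y)` and then `predict_proba(trees, x)` returns a dictionary of averaged class frequencies for `x`.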
Similar resources
Evaluating the efficiency of bank branches with random data
Data Envelopment Analysis (DEA) is a mathematical technique for evaluating the relative efficiency of a group of homogeneous decision making units (DMUs) with multiple inputs and outputs. The efficiency of each unit is measured based on its distance to the production possibility set (PPS). In this paper, the output-oriented BCC model is used. The average return on profit as output and the covaria...
Performance evaluation of EPM and MPSIAC Models for determination of Erosion Status of Shahriari Watershed
Soil erosion is one of the most important environmental issues in developing countries, including Iran, where information about its amount and distribution is inaccurate. For this purpose, the accuracy and distribution of erosion classes obtained from the EPM and MPSIAC models, compared with BLM as ground truth values, were evaluated in the Shahriari watershed. First, the required data and informati...
Unconditionally Stable Difference Scheme for the Numerical Solution of Nonlinear Rosenau-KdV Equation
In this paper we investigate a nonlinear evolution model described by the Rosenau-KdV equation. We propose a three-level average implicit finite difference scheme for its numerical solutions and prove that this scheme is stable and convergent with order O(τ² + h²). Furthermore we show the existence and uniqueness of numerical solutions. Comparing the numerical results with other methods in...
Numerical Simulation of Random Irregular Waves for Wave Generation in Laboratory Flumes
An understanding of wave hydrodynamics and its effects is important for engineers and scientists. Important insights may be gained from laboratory studies. Often the waves simulated in laboratory flumes do not have the full characteristics of real sea waves. It is then necessary to present reliable methods of wave generation in wave flumes. In this paper, the results of numerically simulate...
Spatiotemporal simulation of annual precipitation using stochastic models
Precipitation is one of the most important components of water balance in any region, and the development of efficient models for estimating its spatiotemporal distribution is of considerable importance. The goal of the present research was to investigate the efficiency of the first-order multiple-site autoregressive model in the estimation of spatiotemporal precipitation in Kurdistan, Iran. Fo...